From R for Data Science:
Writing a function has three big advantages over using copy-and-paste:
- You can give a function an evocative name that makes your code easier to understand.
- As requirements change, you only need to update code in one place, instead of many.
- You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).
A function definition assigns a function name to a function statement. That statement has a comma-separated list of named arguments and a body that ends with a return expression. When a function is defined at the top level, its name is stored as a variable in the global environment.
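function_name <- function(argument1, argument2 = 3) {
  # body: code that uses argument1 and argument2
  return(value)
}
Here, function_name and value are placeholders. The body of the function goes inside { }, and return() gives back the function's output. argument2 has a default value of 3, so it can be left out when the function is called. We call the function by writing its name followed by the arguments in parentheses, for example function_name(1) or function_name(1, argument2 = 5).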
When the function is called, the values passed in are assigned to the function's arguments, which can then be used inside the function. Passing inputs in explicitly like this is what keeps functions modular.
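As a concrete instance of function(argument1, argument2 = 3), here is a small hypothetical function, multiply_by. Its name and body are illustrative and not from the original notes. The example shows argument2 falling back to its default, and arguments being matched either by position or by name.

```r
# Hypothetical example: argument2 has a default value of 3
multiply_by <- function(argument1, argument2 = 3) {
  result <- argument1 * argument2
  return(result)
}

multiply_by(2)                  # argument2 falls back to its default: 6
multiply_by(2, 10)              # both arguments matched by position: 20
multiply_by(argument2 = 10, 2)  # argument2 matched by name, 2 goes to argument1: 20
```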
Things to ask yourself:
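{ } marks the body of a function and defines its variable scope. Variables created inside a function's { } are stored in a local environment and are only accessible within that function. All function arguments are also stored in this local environment. The overall environment of the program is called the global environment, and its variables can also be accessed from within the function's { }. (Strictly speaking, it is calling the function that creates the local environment; a bare { } block outside a function does not create a new scope.)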
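You can list the variables in the global environment by running ls() in the console. To look inside a function's local environment while it runs, use the debugonce() function, which opens the interactive debugger the next time that function is called. This “privacy” of the local environment is what makes functions modular: they are meant to be independent tools that do not depend on the state of the global environment.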
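A minimal sketch of these scoping rules follows. The add_offset function and the global_offset variable are both hypothetical. The function can read a global variable, but the variable it creates stays in its local environment and never shows up in ls().

```r
global_offset <- 100  # hypothetical variable, defined in the global environment

add_offset <- function(x) {
  # x and shifted live in this call's local environment.
  # global_offset is not found locally, so R looks it up in the environment
  # where the function was defined (here, the global environment).
  shifted <- x + global_offset
  return(shifted)
}

add_offset(1)      # 101
exists("shifted")  # FALSE: the local variable did not leak into the global environment
ls()               # includes "add_offset" and "global_offset", but not "x" or "shifted"

# To step through one call line by line in the interactive debugger:
# debugonce(add_offset)
# add_offset(1)
```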
Using the addFunction function, let’s see step-by-step how the R interpreter understands our code:
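The definition of addFunction is not reproduced in these notes. As a stand-in, the sketch below assumes addFunction simply adds its two arguments. The argument names num1 and num2, and the local variable total, are hypothetical. The comments number the order in which R handles each piece.

```r
# Step 1: R evaluates the function(...) { ... } expression and stores the
#         resulting function under the name addFunction in the global environment.
# (Assumed definition: addFunction adds its two arguments.)
addFunction <- function(num1, num2) {
  # Step 3: R runs the body inside the new local environment,
  #         where num1 is 3 and num2 is 4.
  total <- num1 + num2
  # Step 4: return() sends the value of total back to the caller.
  return(total)
}

# Step 2: the call creates a fresh local environment and
#         binds num1 to 3 and num2 to 4.
answer <- addFunction(3, 4)

# Step 5: back in the global environment, answer holds the returned value;
#         num1, num2, and total are not visible here.
answer  # 7
```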
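tidyverse functions such as select(), group_by(), and summarize() work with bare column names: you write species, not "species" or penguins$species. When you write your own function that takes a bare column name as an argument and passes it on to a tidyverse function, wrap that argument in {{ }} (called "curly-curly"). This makes the tidyverse function look for the column inside the data.frame instead of for a variable with that name.
One good way to get started learning how to write functions is to write wrapper functions for other functions.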
A wrapper function calls another, often more complicated, function with some of its parameters fixed to default values you choose. The underlying function may have many parameters, but the wrapper lets you call it with only the ones you care about.
Say we always want the na.rm argument to be TRUE in mean(). We can define a wrapper function mean_na() like this:
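Since the definition is not shown here, this is one minimal way to write mean_na(), assuming it only needs to take a single vector, x:

```r
# Wrapper around mean() that always drops missing values
mean_na <- function(x) {
  return(mean(x, na.rm = TRUE))
}

mean(c(1, NA, 3))     # NA, because mean() keeps NA by default
mean_na(c(1, NA, 3))  # 2
```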
Create a function called add_and_raise_power that takes 3 numeric arguments. The function adds the first two arguments together, raises the sum to the power given by the third argument, and returns the resulting value.
Here is a use case: add_and_raise_power(1, 2, 3) = 27 because the function will return this expression: (1 + 2) ^ 3.
Another use case: add_and_raise_power(3, 1, 2) = 16, because of the expression (3 + 1) ^ 2. Confirm that these use cases work.
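One possible solution, to compare against once you have tried the exercise yourself. The argument names a, b, and power are our own choice.

```r
add_and_raise_power <- function(a, b, power) {
  # Add the first two arguments, then raise the sum to the third
  result <- (a + b) ^ power
  return(result)
}

add_and_raise_power(1, 2, 3)  # (1 + 2) ^ 3 = 27
add_and_raise_power(3, 1, 2)  # (3 + 1) ^ 2 = 16
```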
Create a function called my_dim that takes one argument, a data frame. The function returns a length-2 numeric vector whose first element is the number of rows in the data frame and whose second element is the number of columns. Your result should be identical to the output of the dim() function. How can you leverage existing functions such as nrow() and ncol()?
Use case: my_dim(penguins) = c(344, 8)
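One possible solution, built from nrow() and ncol() as the exercise suggests. The check uses the built-in mtcars data so it runs without palmerpenguins.

```r
my_dim <- function(df) {
  # nrow() and ncol() each return a single integer; c() combines them
  return(c(nrow(df), ncol(df)))
}

# mtcars ships with base R, so this comparison runs without extra packages
identical(my_dim(mtcars), dim(mtcars))  # TRUE
```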
Create your own tidyverse function that takes a data.frame and two columns and uses group_by() and summarize() with mean(). The first column should be the group_by() variable, and the function should return the mean of the second column for each group.
Test out your function with penguins:
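One possible solution, using the {{ }} (curly-curly) operator so the function accepts bare column names. The function name group_mean, the output column name mean_value, and the use of na.rm = TRUE are our own choices. na.rm = TRUE is there because penguins contains missing values. The sketch assumes dplyr is installed. It checks itself against base R's tapply() on the built-in mtcars data, so it runs without palmerpenguins.

```r
library(dplyr)

group_mean <- function(df, group_col, value_col) {
  # {{ }} passes the bare column names through to group_by() and summarize()
  df %>%
    group_by({{ group_col }}) %>%
    summarize(mean_value = mean({{ value_col }}, na.rm = TRUE))
}

# With palmerpenguins loaded:
# group_mean(penguins, species, bill_length_mm)

# Check against base R on the built-in mtcars data
result <- group_mean(mtcars, cyl, mpg)
all.equal(result$mean_value, as.numeric(tapply(mtcars$mpg, mtcars$cyl, mean)))  # TRUE
```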